A Dynamic Data Structure for Checking Hyperacyclicity 


Percy Liang Nathan Srebro 


MIT Computer Science and Artificial Intelligence Laboratory 
Cambridge, MA 02139, USA 
{pliang,nati}@mit.edu 


Abstract 


We present a dynamic data structure that keeps track of an acyclic hypergraph (equivalently, a triangulated
graph) and enables verifying that adding a candidate hyperedge (clique) will not break the
acyclicity of the augmented hypergraph. This is a generalization of the use of Tarjan's Union-Find data
structure for maintaining acyclicity when augmenting forests, and the amortized time per operation has
a similar almost-constant dependence on the size of the hypergraph. Such a data structure is useful when
augmenting acyclic hypergraphs, e.g., in order to greedily construct a high-weight acyclic hypergraph. In
designing this data structure, we introduce a hierarchical decomposition of acyclic hypergraphs that aids
in understanding hyper-connectivity, and introduce a novel concept of a hypercycle, which is excluded
from acyclic hypergraphs.


1 Introduction 


Acyclic hypergraphs, or hyperforests (such as the one in Figure 1(a)), are a natural generalization of forests.
They have been independently, and equivalently, defined in many different domains, and are also studied
as triangulated graphs (hyperforests are those hypergraphs formed by the cliques of triangulated graphs).
Hyperforests are useful in many domains where higher-order relations are to be captured, but certain
tree-like "acyclic" properties are desired.


Figure 1: An example of a 2-hyperforest. 


Acyclicity can allow many calculations to be carried out efficiently using dynamic programming. Such 
calculations include a broad class of combinatorial problems [Cou90] as well as inference in graphical 
models [Bes74]. In such applications the computation is often exponential in the width of the hypergraph, 
which corresponds to the maximum size of the hyperedges (or cliques in a triangulated graph). The class of 
K-hyperforests (width at most K) is then of particular interest. 

When a K-hyperforest is necessary, one often wishes to choose the best possible K-hyperforest, where
the quality of a hyperforest is captured as the sum of precomputed weights over its hyperedges, leading to
the problem of finding a maximum-weight K-hyperforest [KS01]. Such a procedure is common in several
different domains for the special case where K = 1 and one seeks a maximum-weight tree: maximum
likelihood Markov trees, known as Chow-Liu trees [CL68]; Hunter-Worsley trees for Bonferroni
inequalities [Wor82]; and trees used to ensure efficient combinatorial optimization, e.g. [Mat99].
Generalizations to higher-width hyperforests are possible and desirable, and have recently been investigated
[Mal91, Sre01, BP01, Tom86].

Unfortunately, when K > 1 finding the maximum-weight K-hyperforest is NP-complete, and finding
good approximation algorithms remains an open problem. The common heuristic approach is a Prim-like
greedy approach, maintaining a fully connected hypertree and adding to it only single vertices [Mal91,
BP01, BJ02]. Alternatively, one might consider a more flexible, and possibly more powerful,
Kruskal-like greedy approach, adding hyperedges to a possibly unconnected K-hyperforest. In order to do so, it is
necessary to ensure that a new hyperedge about to be added does not break the acyclicity.

A particular situation where the Kruskal-like approach is necessary is when we would like to greedily
augment an initial, possibly unconnected, hyperforest. This might be a required, or strongly desirable,
substructure, or a high-weight substructure found by global search techniques (it is possible to efficiently
find hyperforests containing at least a constant fraction of the optimal weight [KS01]).

Acyclicity is also important in order to preclude possible conflicts in, e.g., relational databases [BFMY83],
or when learning graphical models [Bes74]. In such applications, one might want to ensure that new
relations added, e.g., to a database scheme, do not break its acyclicity.

When augmenting forests, Tarjan's Union-Find dynamic data structure enables efficiently checking whether a
new edge breaks the acyclicity, by keeping track of the connected components in the graph. The main result
in this paper is a dynamic data structure that serves a purpose analogous to Tarjan's Union-Find structure
in hyperforests: for any candidate hyperedge, the data structure enables verifying that adding the hyperedge
will not break the acyclicity of the augmented hyperforest. The amortized time per operation is almost
independent of the hypergraph size (it depends on the size only through the inverse of Ackermann's function).
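For comparison, the forest case can be sketched in a few lines of Python. This is a minimal illustration, not Tarjan's original code: the names `UnionFind` and `closes_cycle` are our own, and the sketch uses union by rank with path halving, a standard variant of path compression with the same inverse-Ackermann amortized bound.

```python
class UnionFind:
    """Disjoint sets over hashable items, with union by rank and path halving."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, x):
        # Unseen items start out as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking to the root.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def closes_cycle(uf, u, v):
    """Adding edge (u, v) to a forest creates a cycle iff u and v are already connected."""
    return uf.find(u) == uf.find(v)


forest = UnionFind()
for u, v in [(1, 2), (2, 3), (4, 5)]:
    assert not closes_cycle(forest, u, v)
    forest.union(u, v)

print(closes_cycle(forest, 1, 3))  # True: 1-2-3 already joins them
print(closes_cycle(forest, 3, 4))  # False: the edge would merge two trees
```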

We show that in hyperforests it is no longer enough to consider a single type of connectedness. Thus,
the simple notion of connected components, which can be captured using a single Union-Find structure, is
not enough. Instead, we present a novel view of hyperforests at different levels, each highlighting a different
degree of connectivity, and use a separate Union-Find structure for each level.

On the way to developing such a data structure, we also suggest the notion of a hypercycle. Although
acyclic hypergraphs have been studied for the past three decades, using many equivalent characterizations,
we are not aware of any characterization that directly defines a hypercycle and characterizes acyclic
hypergraphs as those that do not have hypercycles. Such a characterization, which we give in Definition 9 and
Theorem 12, provides added insight into hyperforests.

The rest of this paper is organized as follows: in Section 2 we define hyperforests and specify the 
desired data structure. In Section 3 we examine the concepts of hyperconnectivity and hypercycles and lay 
the foundations for the proof of the data structure, which is presented in Section 4. Finally, in Section 5 we 
discuss the problem of finding maximum weight hypertrees and the utility of Kruskal-like greedy approaches 
over Prim-like approaches. 


2 Hypergraph Acyclicity 


Preliminaries A hypergraph H(V) is a collection of subsets, or hyperedges, of the vertex set V: $H(V) \subseteq 2^V$.
If $h' \subseteq h \in H$, then the hyperedge $h'$ is covered by H. Of particular interest are the maximal hyperedges
of a hypergraph H, which are not covered by any other hyperedge in H; in fact, in this paper we refer to H
as containing only such maximal hyperedges, while denoting by $\bar{H}$ the collection of all covered hyperedges:
$\bar{H} = \{h \subseteq V \mid \exists h' \in H,\ h \subseteq h'\}$. We say that a hypergraph $H_1$ covers $H_2$ if $H_2 \subseteq \bar{H}_1$.

The projection of a hypergraph H onto a set of vertices s is $H_s = \{h \cap s \mid h \in H\}$.
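These notions translate directly into set operations. The Python sketch below is illustrative only: hyperedges are represented as `frozenset`s of vertices, the helper names are our own, and the empty set is left out of $\bar{H}$ for convenience.

```python
from itertools import combinations


def maximal(H):
    """Keep only the maximal hyperedges (those not strictly inside another one)."""
    H = {frozenset(h) for h in H}
    return {h for h in H if not any(h < g for g in H)}


def covered(H):
    """H-bar without the empty set: every non-empty subset of some hyperedge of H."""
    out = set()
    for h in H:
        for r in range(1, len(h) + 1):
            out.update(frozenset(c) for c in combinations(h, r))
    return out


def covers(H1, H2):
    """H1 covers H2 iff every hyperedge of H2 is covered by H1."""
    cov = covered(H1)
    return all(frozenset(h) in cov for h in H2)


def project(H, s):
    """Projection H_s = {h & s : h in H}."""
    s = frozenset(s)
    return {frozenset(h) & s for h in H}


H = maximal(["abc", "ab", "cd"])  # "ab" is covered by "abc" and dropped
print(sorted("".join(sorted(h)) for h in H))                  # ['abc', 'cd']
print(sorted("".join(sorted(h)) for h in project(H, "acd")))  # ['ac', 'cd']
```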


Several equivalent definitions of hypergraph acyclicity are in common use (see [Sre00] for a review). 
Here, we define acyclicity using the notion of a tree structure: 


Definition 1. A hypergraph H is said to have a tree structure T(H) iff T is a tree over all the hyperedges of
H and the following path overlap property holds: if $(h_1, h_2, \ldots, h_m)$ is a path of H-hyperedges in T, then
$\forall\, 1 \leq i \leq m,\ h_1 \cap h_m \subseteq h_i$.


Definition 2. A hypergraph is acyclic iff it has a tree structure. An acyclic hypergraph is also referred to 
as a hyperforest. We say that a hyperforest has width (at most) K, and refer to it as a K-hyperforest, if its 
hyperedges are of size at most K +1. 
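Definition 1 can be checked mechanically for a proposed tree. The sketch below (function names are our own) verifies that `tree_edges` form a tree on the hyperedges and then tests the path overlap property by brute force over all pairs, which is far slower than necessary but mirrors the definition directly.

```python
from collections import deque


def bfs_parents(adj, root):
    """Parent pointers of a BFS tree rooted at `root` (the root maps to None)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent


def is_tree_structure(hyperedges, tree_edges):
    """Check Definition 1: T is a tree over the hyperedges and, for every
    tree path h_1, ..., h_m, the overlap h_1 & h_m lies inside every h_i."""
    hs = [frozenset(h) for h in hyperedges]
    n = len(hs)
    if n == 0:
        return True
    if len(tree_edges) != n - 1:
        return False
    adj = {i: [] for i in range(n)}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)
    if len(bfs_parents(adj, 0)) != n:  # n - 1 edges and connected => a tree
        return False
    for a in range(n):
        parent = bfs_parents(adj, a)
        for b in range(a + 1, n):
            common = hs[a] & hs[b]
            node = b
            while node is not None:  # walk the tree path from b back to a
                if not common <= hs[node]:
                    return False
                node = parent[node]
    return True


chain = ["abc", "bcd", "cde"]
print(is_tree_structure(chain, [(0, 1), (1, 2)]))  # True
print(is_tree_structure(chain, [(0, 2), (2, 1)]))  # False: b is missing from cde
```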


Problem Statement: Checking Acyclicity We seek a data structure that will allow us to augment a 
hyperforest by adding hyperedges to it, ensuring that it remains acyclic. That is, the data structure should 
keep track of the "current" hyperforest H and support two operations, where $h_{new}$ is a hyperedge:


QUERY($h_{new}$) returns TRUE iff $H \cup \{h_{new}\}$ is acyclic.


INSERT($h_{new}$) augments $H \leftarrow H \cup \{h_{new}\}$, assuming that it is acyclic.
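As a point of reference, QUERY can always be answered statically by rebuilding the hypergraph and running a classical acyclicity test from scratch. The sketch below uses the GYO reduction (Graham; Yu and Özsoyoğlu), which is not the method of this paper: repeatedly delete vertices that occur in only one hyperedge and hyperedges contained in another, and declare the hypergraph acyclic iff nothing remains. Each call costs time polynomial in the total size of H, which is exactly the per-operation cost a dynamic structure aims to avoid. The function names are our own.

```python
from collections import Counter


def is_acyclic(hypergraph):
    """GYO reduction: the hypergraph is acyclic iff it reduces to nothing."""
    edges = [set(h) for h in hypergraph]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop vertices that occur in exactly one hyperedge.
        counts = Counter(v for h in edges for v in h)
        for h in edges:
            lonely = {v for v in h if counts[v] == 1}
            if lonely:
                h -= lonely  # in-place update of the set stored in `edges`
                changed = True
        # Rule 2: drop empty hyperedges and hyperedges contained in another one
        # (among duplicates, only the copy with the smallest index survives).
        kept = []
        for i, h in enumerate(edges):
            redundant = not h or any(
                j != i and (h < g or (h == g and j < i))
                for j, g in enumerate(edges)
            )
            if redundant:
                changed = True
            else:
                kept.append(h)
        edges = kept
    return not edges


def naive_query(H, h_new):
    """QUERY(h_new) by brute force: is H together with h_new acyclic?"""
    return is_acyclic(list(H) + [h_new])


print(naive_query(["abc", "cde"], "ace"))  # True
print(naive_query(["abc", "cde"], "aef"))  # False: closes the cycle a-c-e
```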


Bounded Tree-Width Hypergraphs Unlike forests, hyperforests do not form a monotone family of
hypergraphs: a sub-hypergraph of an acyclic hypergraph might be cyclic, and conversely, adding hyperedges
to a cyclic hypergraph might make it acyclic. When a tree structure is used in order to perform efficient
computation using dynamic programming (e.g., inference in graphical models), it is often admissible to add
extra hyperedges to a cyclic hypergraph in order to obtain a covering hyperforest. Computation is then
performed using the tree structure of this covering hyperforest. The important requirement is that the width of
the covering hyperforest be small, as computation is exponential in this width:


Definition 3. The tree-width of a hypergraph H is the minimum width of a hyperforest that covers H. 


Accordingly, in such situations, one might want to have a data structure that checks whether adding a
hyperedge maintains low tree-width. If all hyperedges added are of size at most K + 1, then the data
structure presented here ensures a tree-width of not more than K. The converse is not true: a hyperedge
$h_{new}$ might be refused even though $H \cup \{h_{new}\}$ has tree-width at most K.

Before considering dynamic data structures for maintaining low tree-width, it is important to remember 
that calculating the tree-width statically, or equivalently finding a narrow triangulation, is by itself a very 
difficult task. Although linear time algorithms for constant width have recently been discovered [Bod96], the 
dependence on the width is extremely prohibitive and these algorithms are not usable in practice. Instead, 
various heuristics, approximation algorithms, and super-polynomial-time algorithms are used [SG97]. 


Augmentation in Greedy Algorithms Our main motivation for the dynamic data structure stems from
greedy Kruskal-like construction of high-weight hyperforests, particularly in order to find maximum
likelihood Markov networks of bounded tree-width. If the weights on hyperedges are all non-negative, it may
be appropriate to allow bounded tree-width hypergraphs in intermediate stages, requiring a dynamic data
structure that checks tree-width rather than acyclicity. However, in situations in which weights might be
negative or positive, as in the case when the weight of a hypergraph corresponds to its likelihood [Sre01],
we cannot allow intermediate cyclic hypergraphs, as making them acyclic might introduce high negative
weights (the weight of a cyclic hypergraph does not correspond to its likelihood). For such applications,
acyclicity is the correct property to require (along with ensuring each added hyperedge is of proper size).


3 A new look at hyperforests 


3.1 Hyperconnectivity 


Connectivity in hyperforests is substantially more complex than connectivity in forests. In forests, two edges
are either incident or disjoint. In a K-hyperforest, there are K + 1 "degrees" at which two hyperedges can
overlap, corresponding to overlap sizes ranging from 0 to K. We suggest a hierarchical decomposition of a
hyperforest into superedges, which allows us to concentrate on one degree of connectivity at a time.


Definition 4 (Superedge). Two hyperedges $h_1$ and $h_m$ are k-connected¹ if there exists a sequence of
hyperedges $h_1, \ldots, h_m$ such that $|h_i \cap h_{i+1}| \geq k$ for all $1 \leq i \leq m-1$. A k-superedge q of a hyperforest H
is a maximal (k + 1)-connected subset of H.

Denote the set of all vertices in the union of all the hyperedges in a superedge q as $\bar{q} = \bigcup q$.
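Since k-superedges are the classes of the relation "joined by a chain of hyperedges whose consecutive overlaps have at least k + 1 vertices", they can be computed with one Union-Find pass over pairs of hyperedges. A minimal sketch, with our own helper names and a quadratic pair scan chosen for clarity rather than speed:

```python
def superedges(H, k):
    """Partition the hyperedges of H into k-superedges (maximal (k+1)-connected sets)."""
    hs = list({frozenset(h) for h in H})
    parent = list(range(len(hs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(hs)):
        for j in range(i + 1, len(hs)):
            if len(hs[i] & hs[j]) >= k + 1:
                parent[find(i)] = find(j)
    groups = {}
    for i, h in enumerate(hs):
        groups.setdefault(find(i), set()).add(h)
    return [frozenset(g) for g in groups.values()]


def vertex_set(q):
    """q-bar: the union of the hyperedges in superedge q."""
    return frozenset().union(*q)


H = ["abc", "bcd", "cde", "efg"]  # a 2-hyperforest
for k in (-1, 0, 1, 2):
    print(k, sorted(len(q) for q in superedges(H, k)))
# prints:
# -1 [4]
# 0 [4]
# 1 [1, 3]
# 2 [1, 1, 1, 1]
```

As the text states, the $(-1)$-superedge is the whole hyperforest, the 0-superedges are its connected components, and the K-superedges are single hyperedges.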


Definition 5 (Overlap of superedges). The overlap between two superedges $q_1$ and $q_2$ of H is $s = \bar{q}_1 \cap \bar{q}_2$.
$q_1$ and $q_2$ are said to overlap simply if $s = h_1 \cap h_2$ for some $h_1 \in q_1$, $h_2 \in q_2$.
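Checking Definition 5 amounts to a direct search over pairs of hyperedges. In this sketch a superedge is a `frozenset` of hyperedges (our own representation); the second example, with K = 2 and k = 1, is a pair of 1-superedges whose overlap {a, d} is split across two different hyperedge pairs, so they do not overlap simply.

```python
def vertex_set(q):
    """q-bar: union of the hyperedges of superedge q."""
    return frozenset().union(*q)


def overlaps_simply(q1, q2):
    """Definition 5: the overlap of q1 and q2 is realized by one pair of hyperedges."""
    s = vertex_set(q1) & vertex_set(q2)
    return any(h1 & h2 == s for h1 in q1 for h2 in q2)


def superedge(*hyperedges):
    return frozenset(frozenset(h) for h in hyperedges)


q1 = superedge("abc", "bcd")  # a 1-superedge: abc and bcd share {b, c}
print(overlaps_simply(q1, superedge("cef")))  # True: overlap {c} = abc & cef
print(overlaps_simply(q1, superedge("ade")))  # False: {a} from abc, {d} from bcd
```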


Figure 2 shows a hierarchical decomposition of a 3-hyperforest into superedges. Each $(k-1)$-superedge
contains a tree structure over the k-superedges that is a minor of the tree structure over all the hyperedges. In
a K-hyperforest H, the K-superedges are the hyperedges, the 0-superedges are the connected components,
and the single $(-1)$-superedge is the entire hyperforest H.

We can now look at a K-hyperforest one level at a time. At level k, we consider only k-connectivity
by focusing on the k-superedges in a $(k-1)$-superedge. $(<k)$-connectivity does not exist in a
$(k-1)$-superedge², and $(>k)$-connectivity is abstracted into the k-superedges. As a result, the tree structure over
the k-superedges in a $(k-1)$-superedge has the desirable property that all overlaps between adjacent
k-superedges have the same size.
Figure 2: An example of the hierarchical decomposition of superedges. (a) depicts a hypergraph H, and (b)
shows the tree structures of H at different levels.


Lemma 6. The path in a tree structure T between two hyperedges in the same superedge q contains only
hyperedges from q (i.e., the hyperedges of a superedge appear contiguously in T).


¹Note that this is not the usual definition of k-connected.
²In this way, a k-superedge q "functions" as a k-hyperedge, because all overlaps between q and any hyperedge not in q have size
at most k.


Proof: Since any overlap along the path in T between two hyperedges is a separator between them, and the
two hyperedges are $(k+1)$-connected, all overlaps must be of size at least $k+1$, so all hyperedges along
the path are $(k+1)$-connected to them and hence belong to q. □


Let us now show that a hierarchical decomposition such as the one in Figure 2 exists for all hyperforests.
We do this by constructing, for each $(k-1)$-superedge p, a tree structure over the k-superedges of p.


Definition 7. If p is a $(k-1)$-superedge, a tree structure $T_Q$ over a set of k-superedges $Q_p$ in p is a tree
such that the following hold:


1. Any two k-superedges in $Q_p$ overlap simply.
2. For all paths $q_1, \ldots, q_m$ in $T_Q$, $\forall\, 1 \leq i \leq m,\ \bar{q}_1 \cap \bar{q}_m \subseteq \bar{q}_i$.


Theorem 8. A hypergraph H is a hyperforest iff for all k, for all $(k-1)$-superedges p, there exists a tree
structure over the set $Q_p$ of k-superedges of p.


Proof.

$\Rightarrow$: From the tree structure $T_H$ of H, we will construct a tree structure $T_Q$ over $Q_p$, which is a minor
of $T_H$. Include $(q_1, q_2) \in T_Q$ iff there is some edge $(h_1, h_2) \in T_H$ with $h_1 \in q_1$ and $h_2 \in q_2$. Call $h_1$ and
$h_2$ the gateway hyperedges of the edge $(q_1, q_2) \in T_Q$. Now we have to show that $T_Q$ is a valid tree structure.

To verify that $T_Q$ is actually a tree, we will show that a cycle³ $q_0, q_1, \ldots, q_m$ in $T_Q$ means there is a cycle
in $T_H$. Let $h_{i-1} \in q_{i-1}$ and $g_i \in q_i$ be the two gateway hyperedges of $(q_{i-1}, q_i)$. By Lemma 6, the path
from $g_i$ to $h_i$ in $T_H$ contains only hyperedges in $q_i$. Consider the sequence $c = h_0, g_1, \ldots, h_1, g_2, \ldots, h_2, \ldots$
Since the superedges $q_1, \ldots, q_m$ are distinct (and hence disjoint), c contains at least $m \geq 3$ distinct hyperedges, so c is a cycle in $T_H$.

To verify that all superedges overlap simply, we will show that the overlap between $q_1, q_m \in Q$ is
contained in gateway hyperedges of $q_1$ and $q_m$. Let $q_1, \ldots, q_m$ be the path in $T_Q$, and let $h_1 \in q_1$ and $h_m \in q_m$
be gateway hyperedges of $(q_1, q_2)$ and $(q_{m-1}, q_m)$, respectively. Both $h_1$ and $h_m$ must be on the path in $T_H$
from any $h'_1 \in q_1$ to any $h'_m \in q_m$. Because $T_H$ is a tree structure, $h'_1 \cap h'_m \subseteq h_1 \cap h_m$, so $\bar{q}_1 \cap \bar{q}_m \subseteq h_1 \cap h_m$.

To verify the path overlap property, we invoke the path overlap property of $T_H$ and notice that the
overlaps along the path between two k-superedges $q_1, q_m \in Q$ are included in the overlaps along a
corresponding path in $T_H$. Let s be the overlap between $q_1$ and $q_m$ ($s = h_1 \cap h_m$ for some $h_1 \in q_1$, $h_m \in q_m$).
The path from $h_1$ to $h_m$ in $T_H$ contains the gateway hyperedges of each $(q_i, q_{i+1})$. Since s is contained in
every hyperedge in the path $h_1, \ldots, h_m$ (in particular, the gateway hyperedges), s is also contained in every
superedge in the path $q_1, \ldots, q_m$.

$\Leftarrow$: To show that H is a hyperforest, we will construct a tree structure $T_H$ over H. To do this, we
assume by induction that there is a tree structure $T_q$ over each $(k+1)$-superedge q of H and then show
that there is a tree structure $T_p$ over each k-superedge p of H. When $k = -1$, $p = H$ is the single
$(-1)$-superedge, and we have the desired tree structure $T_H = T_p$.


Base case (k = K): Each k-superedge is a single hyperedge and has a trivial tree structure. 


Inductive case (k < K): Assume that there is a tree structure $T_q$ over each $(k+1)$-superedge q of H.
Fix any k-superedge p. p is partitioned into a set Q of $(k+1)$-superedges, each of which has a tree
structure by the induction hypothesis. Furthermore, by assumption there is a tree structure $T_Q$ over Q. We now
construct $T_p$: $T_p$ includes $\bigcup_{q \in Q} T_q$ plus an edge $(h_{q_1}, h_{q_2})$ for each $(q_1, q_2) \in T_Q$, where $h_{q_1} \in q_1$ and
$h_{q_2} \in q_2$ are chosen to be any hyperedges containing $\bar{q}_1 \cap \bar{q}_2$ (such hyperedges exist because $q_1$ and $q_2$ overlap simply). It is clear that $T_p$ is a valid tree structure.


□
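The $\Rightarrow$ direction of the proof of Theorem 8 builds $T_Q$ as a minor of $T_H$ by contracting each k-superedge to a single node. A minimal sketch of that contraction, where `label` (our own name) maps each hyperedge to the k-superedge containing it, and one pair of gateway hyperedges is recorded for each edge of $T_Q$:

```python
def contract_tree(tree_edges, label):
    """Return {frozenset({q1, q2}): (h1, h2)}: the edges of T_Q, each with one
    pair of gateway hyperedges (h1 in q1, h2 in q2) joined by an edge of T_H."""
    gateways = {}
    for h1, h2 in tree_edges:
        q1, q2 = label[h1], label[h2]
        if q1 != q2:  # edges inside a single superedge are contracted away
            gateways.setdefault(frozenset((q1, q2)), (h1, h2))
    return gateways


# T_H is the chain abc - bcd - cde - efg; at level k = 1 the 1-superedges
# are A = {abc, bcd, cde} and B = {efg}.
T_H = [("abc", "bcd"), ("bcd", "cde"), ("cde", "efg")]
label = {"abc": "A", "bcd": "A", "cde": "A", "efg": "B"}
print(list(contract_tree(T_H, label).values()))  # [('cde', 'efg')]
```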


³Recall that a (simple) cycle is a closed path of at least three vertices, all distinct. To clarify notation, if we say $q_0, q_1, \ldots, q_m$
is a cycle, then $q_0 = q_m$ and $m \geq 3$.


3.2 Hypercycles 


A tree structure proves the acyclicity of a hypergraph, whereas a hypercycle proves the non-acyclicity of a 
hypergraph. 


Definition 9. A k-hypercycle is one of the following: 


1. Two k-superedges that overlap non-simply. Call this a hyperdoublet. 

2. A sequence $c = q_1, q_2, \ldots, q_m$ ($m \geq 3$) of distinct k-superedges with distinct overlaps $s_i = \bar{q}_i \cap \bar{q}_{i+1}$
of size exactly k, such that there exists some $s_i$ that does not contain $s_* = \bar{q}_1 \cap \bar{q}_m$. If $|s_*| = k$, call c
a regular hypercycle. Otherwise, call c a hyperloop.


A regular 1-hypercycle is exactly a simple cycle: the overlaps between edges of the cycle are the distinct
vertices along it, while the overlap $s_*$ is the vertex between the "first" and "last" edge. In higher-order
cycles, we cannot always require that the overlap $s_*$ that "closes" the cycle also be of size exactly k (e.g., (c)
in Figure 3). However, it is not enough to require that the overlap $s_*$ be non-empty (e.g., (d) in the figure):
to be a cycle, a path must "touch" itself "outside" of the common overlap in the path.
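Part 2 of Definition 9 is straightforward to test for a given sequence. The sketch below (our own helper) represents each k-superedge only by its vertex set $\bar{q}_i$, assumes the inputs are k-superedges of the same hypergraph, and does not handle hyperdoublets (part 1). The three examples are a simple 4-cycle, a hyperloop whose closing overlap {a, b} is larger than k = 1, and a path that only touches itself inside the common overlap {a}.

```python
def classify_sequence(vertex_sets, k):
    """Part 2 of Definition 9 for k-superedges given by their vertex sets;
    returns 'regular hypercycle', 'hyperloop', or None (not a hypercycle)."""
    qs = [frozenset(q) for q in vertex_sets]
    m = len(qs)
    if m < 3 or len(set(qs)) != m:
        return None
    overlaps = [qs[i] & qs[i + 1] for i in range(m - 1)]  # s_1, ..., s_{m-1}
    if any(len(s) != k for s in overlaps) or len(set(overlaps)) != len(overlaps):
        return None
    s_star = qs[0] & qs[-1]  # the overlap that "closes" the sequence
    if all(s_star <= s for s in overlaps):
        return None  # the path touches itself only inside the common overlap
    return "regular hypercycle" if len(s_star) == k else "hyperloop"


print(classify_sequence(["ab", "bc", "cd", "da"], 1))  # regular hypercycle
print(classify_sequence(["abc", "cde", "eab"], 1))     # hyperloop
print(classify_sequence(["abx", "abc", "acy"], 2))     # None
```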


Figure 3: Examples of hypercycles. [Figure panels, k = 1: (b) hyperdoublets, (c) a hyperloop, (d) not a hypercycle.]


Before introducing the main theorem of this section, we state the following lemma which facilitates 
discussing hypergraphs without hypercycles: 


Definition 10. We say that a sequence of overlaps $s_1, \ldots, s_m$ is block-distinct if all repeated overlaps occur
next to each other (if $s_i = s_j$, then $\forall\, i \leq l \leq j$, $s_i = s_l = s_j$).


Lemma 11. A hypergraph has no k-hypercycles iff both of the following conditions are true:


1. All k-superedges overlap simply.
2. For all sequences $q_1, q_2, \ldots, q_m$ ($m \geq 3$) of distinct k-superedges with block-distinct overlaps
$s_i = \bar{q}_i \cap \bar{q}_{i+1}$ of size exactly k, $s_* = \bar{q}_1 \cap \bar{q}_m$ is contained in every $s_i$.


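Proof: The difference between this lemma and the definition of a hypercycle is that in condition 2 we
require block-distinct overlaps, whereas in part 2 of Definition 9 we require distinct ones. Before we show that distinct and
block-distinct are interchangeable, we note that in condition 2 of Lemma 11, "$\forall i,\ s_* \subseteq s_i$" can be equivalently
replaced by "$\exists i$ with $1 < i < m$, $s_* \subseteq \bar{q}_i$": suppose that $s_* \subseteq \bar{q}_i$ for some such i. Then $s_* \subseteq \bar{q}_1 \cap \bar{q}_i$ and $s_* \subseteq \bar{q}_i \cap \bar{q}_m$. By induction
on the two halves $q_1, \ldots, q_i$ and $q_i, \ldots, q_m$, $s_* \subseteq \bar{q}_j$ for all j, and hence $s_* \subseteq s_j$ for all j.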


Now, for each sequence $c = q_1, \ldots, q_m$ with block-distinct overlaps, we construct a subsequence $c'$ with
distinct overlaps by replacing each maximal contiguous subsequence $q_i, \ldots, q_j$ in c with identical overlaps
along $q_i, \ldots, q_j$ with just $q_i$ and $q_j$. The overlaps along $c'$ are distinct and also contain $s_*$. □
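Both Definition 10 and the compression step in this proof are short list manipulations. In the sketch below (our own names), superedges are again represented by their vertex sets; `compress` keeps the two endpoints of the sequence and every interior $q_i$ whose incoming and outgoing overlaps differ, which is exactly "replace each maximal run with its endpoints".

```python
def is_block_distinct(overlaps):
    """Definition 10: equal overlaps may only appear in one contiguous run."""
    seen, previous = set(), None
    for s in overlaps:
        if s != previous:
            if s in seen:
                return False  # s reappears after a different overlap
            seen.add(s)
            previous = s
    return True


def compress(seq):
    """Shrink each maximal run q_i, ..., q_j whose consecutive overlaps are
    identical to just its endpoints q_i and q_j."""
    qs = [frozenset(q) for q in seq]
    if len(qs) < 3:
        return qs
    s = [qs[i] & qs[i + 1] for i in range(len(qs) - 1)]
    # q_i is interior to a run exactly when the overlaps before and after it agree.
    keep = [0] + [i for i in range(1, len(qs) - 1) if s[i - 1] != s[i]] + [len(qs) - 1]
    return [qs[i] for i in keep]


b, d = frozenset("b"), frozenset("d")
print(is_block_distinct([b, b, d]))  # True
print(is_block_distinct([b, d, b]))  # False
print(["".join(sorted(q)) for q in compress(["ab", "bc", "bd", "de"])])  # ['ab', 'bd', 'de']
```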


Requiring lack of hypercycles resembles the definition of a tree structure over superedges (Definition 
7). The main difference between the two conditions is that the path overlap property requires agreeing on 
a global object, namely the tree structure, and only paths along the tree structure must have the path
overlap property. The independence from a specific tree structure, which will be crucial later in proving 
the correctness of the data structure, is achieved by requiring a uniform degree of connectivity between 
superedges. 


Theorem 12. H is a hyperforest iff H contains no hypercycle. 


Proof.
$\Rightarrow$: Let $T_H$ be any tree structure of H. We will verify the two conditions in Lemma 11.


1. To show that any two k-superedges $q_1, q_2$ overlap simply, let k' be the largest value such that $q_1$ and
$q_2$ are in the same $(k'-1)$-superedge p. Let $r_1$ be the k'-superedge that contains $q_1$, and $r_2 \neq r_1$ be
the k'-superedge that contains $q_2$. By Theorem 8, $r_1$ and $r_2$ overlap simply, so $q_1$ and $q_2$ also overlap
simply.


2. Let $c = q_1, \ldots, q_m$ be a sequence of distinct k-superedges with distinct overlaps $s_i$ of size k. We
want to show that for all i, $s_* \subseteq s_i$. Consider the path $c'$ in $T_Q$ which is the concatenation of the paths
in $T_Q$ between every $q_i, q_{i+1}$: $c' = q_1, \ldots, q_2, \ldots, q_m$. Since all overlaps $s_i$ are of size k, all adjacent
k-superedges in $c'$ in the sub-path $q_i, \ldots, q_{i+1}$ have the same overlap $s_i$. Label each edge $(q, q')$ along
the path $c'$ in $T_Q$ with the overlap $\bar{q} \cap \bar{q}'$. Then, for $i \neq j$, the set of edges from $q_i$ to $q_{i+1}$ and the set
of edges from $q_j$ to $q_{j+1}$ are disjoint because their labels are different ($s_i \neq s_j$). Of course, for all i,
the edges between $q_i$ and $q_{i+1}$ are distinct because they form a simple path. Therefore, all edges in the
path $c'$ are distinct, and $c'$ is a simple path. Moreover, $T_Q$ is a tree structure, so $\forall\, 1 \leq i \leq m$, $s_* \subseteq \bar{q}_i$.


$\Leftarrow$: For a hypergraph H with no hypercycles, we will show that H is a hyperforest by constructing a
tree structure $T_p$ over the set of k-superedges $Q_p$ of each $(k-1)$-superedge p (Theorem 8). For each covered
hyperedge (subset of a hyperedge) s of size exactly k, consider the k-superedges $q_1, q_2, \ldots, q_m \in Q_p$ that
contain it. Since H has no hyperdoublets, any two of these distinct k-superedges overlap simply, so their overlap
is the overlap of two hyperedges from different k-superedges and has size at most k; as it contains s, it is of size
exactly k and is therefore exactly s. Choose one of these, say $q_1$, arbitrarily and connect all remaining $q_2, \ldots, q_m$ to
$q_1$ in $T_p$ (i.e., $\forall\, 1 < i \leq m$, $(q_1, q_i) \in T_p$).

To show that $T_p$ is a tree, label each edge in $T_p$ with its corresponding overlap. By construction of
$T_p$, for each label, there is a single k-superedge ($q_1$ in the above notation) that is incident to all edges so
labeled. Therefore, every simple path must contain at most two edges of the same label, and if there are
two such edges in the path, they must be adjacent. Suppose for contradiction that there is a simple cycle
$c = q_0, q_1, \ldots, q_m$ in $T_p$ with $s_i = \bar{q}_i \cap \bar{q}_{i+1}$ for $0 \leq i < m$, such that $s_0 \neq s_1$. Then $q_1, \ldots, q_m$ is a sequence of
distinct k-superedges with block-distinct overlaps. Due to condition 2 of Lemma 11, $s_0 \subseteq s_1$. But this is a
contradiction, since both overlaps are of size k. Thus, c could not have existed, and $T_p$ is indeed a tree.

The simplicity of the overlaps and the path overlap property in $T_p$ now follow immediately from the
definition of a hypercycle.

□
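The $\Leftarrow$ direction of the proof of Theorem 12 builds $T_p$ as a union of stars: every covered k-set s selects one hub among the k-superedges of p that contain it and joins the others to that hub. A sketch under our own assumptions: each k-superedge is given as a collection of hyperedges, a superedge "contains" s when one of its hyperedges does, and the hub is simply the first such superedge in the given order.

```python
from itertools import combinations


def star_tree(superedges, k):
    """Edges (hub, other, s) of T_p, each labelled by the shared k-set s."""
    holders = {}
    for idx, q in enumerate(superedges):
        k_sets = {frozenset(c) for h in q for c in combinations(h, k)}
        for s in k_sets:
            holders.setdefault(s, []).append(idx)
    edges = []
    for s, idxs in holders.items():
        hub = idxs[0]  # the arbitrary choice of q_1 in the proof
        edges.extend((hub, other, s) for other in idxs[1:])
    return edges


# One connected component (a 0-superedge) with K = 2. Its 1-superedges are the
# single hyperedges abc, cde, efg, cxy, since they pairwise share at most one vertex.
Q_p = [["abc"], ["cde"], ["efg"], ["cxy"]]
print(sorted((a, b) for a, b, _ in star_tree(Q_p, 1)))  # [(0, 1), (0, 3), (1, 2)]
```

Here the vertex c is shared by three superedges and produces a star centred at abc, while e links cde and efg, giving a tree on the four superedges.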


QUERY($h_{new}$):
    compute $\bar{H}_{h_{new}}$ and $Z_k$
    for k ← K downto 1 do
        for s, t ∈ $S_k$ do
            if $U_k(s,t)$ and not $Z_k(s,t)$ then
                return FALSE
    return TRUE

INSERT($h_{new}$):
    assert QUERY($h_{new}$)
    for k ← 1 to K do
        for s, t ∈ k-subsets of $h_{new}$ do
            union(s, t) in $U_k$


Figure 4: Pseudocode for QUERY and INSERT. In QUERY, $S_k$ denotes the set of k-supervertices in $\bar{H}_{h_{new}}$.
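A Python sketch of Figure 4 follows, under our own assumptions: vertices are hashable, $\bar{H}$ is stored as a set of `frozenset`s, each $U_k$ is a dictionary-based Union-Find, and $Z_k$ is rebuilt from scratch inside every QUERY. INSERT links every k-subset of $h_{new}$ (not only those already covered), as Section 4.2 describes, and unifying each k-subset with the first one has the same effect as unifying all pairs. Class and method names are illustrative, not taken from the paper.

```python
from itertools import combinations


class UnionFind:
    """Dictionary-based disjoint sets (union by rank, path halving)."""

    def __init__(self):
        self.parent, self.rank = {}, {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x], self.rank[x] = x, 0
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def k_subsets(h, k):
    return [frozenset(c) for c in combinations(h, k)]


def link_all(uf, items):
    """Put all items in one set (same effect as unifying every pair)."""
    for x in items[1:]:
        uf.union(items[0], x)


class HyperforestChecker:
    def __init__(self, K):
        self.K = K
        self.covered = set()                                # H-bar
        self.U = {k: UnionFind() for k in range(1, K + 1)}  # U_1, ..., U_K

    def query(self, h_new):
        h_new = frozenset(h_new)
        # H-bar projected onto h_new: the subsets of h_new already covered by H.
        proj = [frozenset(c) for r in range(1, len(h_new) + 1)
                for c in combinations(h_new, r) if frozenset(c) in self.covered]
        for k in range(self.K, 0, -1):
            Z = UnionFind()  # Z_k: k-connectivity inside the projection
            for h in proj:
                link_all(Z, k_subsets(h, k))
            S_k = [h for h in proj if len(h) == k]
            for s, t in combinations(S_k, 2):
                # FALSE if s and t are connected in H but not within h_new.
                if self.U[k].find(s) == self.U[k].find(t) and Z.find(s) != Z.find(t):
                    return False
        return True

    def insert(self, h_new):
        h_new = frozenset(h_new)
        if len(h_new) > self.K + 1 or not self.query(h_new):
            raise ValueError("h_new is too large or would break acyclicity")
        for r in range(1, len(h_new) + 1):
            self.covered.update(k_subsets(h_new, r))
        for k in range(1, self.K + 1):
            link_all(self.U[k], k_subsets(h_new, k))


hf = HyperforestChecker(K=2)
hf.insert("abc")
hf.insert("cde")
print(hf.query("ace"))  # True: ace can sit between abc and cde
print(hf.query("aef"))  # False: abc, cde, aef would form a hypercycle
```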


4 The Data Structure 


We use ordinary Union-Find structures at each of the K levels. At level k, a Union-Find structure $U_k$ keeps
track of disjoint sets of connected k-supervertices.


Definition 13 (Supervertex). A k-supervertex (same as a covered $(k-1)$-hyperedge) of a hypergraph H
is a k-subset of some hyperedge $h \in H$. Two k-supervertices $s_1$ and $s_2$ are k-connected (or just connected)
if there exists a $(k-1)$-superedge q of H such that $s_1, s_2 \subseteq \bar{q}$.


4.1 Overview of the data structure state 


We maintain the following two values with respect to the current hyperforest H: 


• The set $\bar{H}$ of all covered hyperedges in H, equivalent to the supervertices of H.
• The transitive relation $U_k$ for each $k \leq K$, where $U_k(s,t)$ specifies whether the two k-supervertices
s and t are connected.


In addition, in each QUERY operation, we compute two additional values that depend on $h_{new}$.
These two values are projections of $\bar{H}$ and $U_k$ onto $h_{new}$.


• The set $\bar{H}_{h_{new}}$, which is the projection of $\bar{H}$ onto $h_{new}$.
• The relation $Z_k$ for each $k \leq K$, where $Z_k(s,t)$ specifies whether two k-supervertices s and t are
connected in $\bar{H}_{h_{new}}$.


4.2 The QUERY and INSERT operations 


QUERY($h_{new}$) returns FALSE iff for some k there exist two k-supervertices s and t of $h_{new}$ such that s and t are
connected in H but not connected in $\bar{H}_{h_{new}}$. If QUERY($h_{new}$) returns TRUE, INSERT($h_{new}$) connects all
k-supervertices of $h_{new}$ in $U_k$. See Figure 4 for pseudocode.

Roughly speaking, QUERY returns FALSE when s and t are connected "outside" of $h_{new}$. In this case,
adding $h_{new}$ would close a hypercycle. It is not enough to merely require that s and t are connected in H,
since if they are also connected to the same extent inside $h_{new}$, $h_{new}$ might "collapse" into the path between
s and t.


4.3 Correctness 


We now show that QUERY($h_{new}$) returns TRUE iff $H' = H \cup \{h_{new}\}$ is a hyperforest⁴. Let $\bar{H}' =
\bar{H} \cup \overline{\{h_{new}\}}$ be the supervertices of the augmented hyperforest.


Theorem 14. If QUERY($h_{new}$) returns TRUE, then H' is a hyperforest.


Proof. From any tree structure $T_H$ of H, we will construct a tree structure $T_{H'}$ for H'. The idea is to break
up $T_H$ into the subtrees $T_1(H_1), \ldots, T_m(H_m)$ that $h_{new}$ separates, and then connect each subtree to $h_{new}$
via some $h_j \in T_j$.

Specifically, to get $T_{H'}$, remove each edge $(h_a, h_b)$ in $T_H$ such that $h_a \cap h_b \subseteq h_{new}$. We claim that
letting $h_j = \arg\max_{h \in T_j} |h \cap h_{new}|$ (the hyperedge that overlaps with $h_{new}$ the most) creates a valid tree
structure. It is enough to verify that the path overlap property holds for paths in $T_{H'}$ involving $h_{new}$: both
paths $h_a, \ldots, h_{new}, \ldots, h_b$ passing through $h_{new}$ and paths in which $h_{new}$ is an endpoint.

Paths passing through $h_{new}$ connect $h_a, h_b$ from different subtrees, and so the path from $h_a$ to $h_b$ in $T_H$
must have contained a removed edge, and $h_a \cap h_b \subseteq h_{new}$.

For paths where $h_{new}$ is an endpoint, we show the path overlap property by showing that the sequence of
overlaps between $h_{new}$ and every hyperedge along the path from $h_a$ to $h_{new}$ in $T_{H'}$ increases telescopically.
To do this, we must argue that (1) the overlaps form a subset relation, and that (2) there are no "local
minimum" overlaps where both the preceding and following overlaps are proper supersets of an overlap.
Formally, we verify two properties⁵:


1. For any $(h_a, h_b) \in T_{H'}$, if $h_a \cap h_b \not\subseteq h_{new}$, then either $h'_a \subseteq h'_b$ or $h'_b \subseteq h'_a$. Otherwise, there would
be two k-supervertices in $h'_a$ and $h'_b$ connected "outside of" $h_{new}$.

2. For any path $h_a, h_b, h_c$ in $T_{H'}$ such that $h_a \cap h_b \not\subseteq h_{new}$ and $h_b \cap h_c \not\subseteq h_{new}$, $|h'_b| \geq \min\{|h'_a|, |h'_c|\}$.
Otherwise, there would be two k-supervertices in $h'_a$ and $h'_c$ connected "outside of" $h_{new}$.


□
Theorem 15. If QUERY($h_{new}$) returns FALSE, then H' is not a hyperforest.


Proof. We will show how to construct a hypercycle using the two k-supervertices s, t that are k-connected
"outside" $h_{new}$. We will construct a hypercycle that includes $h_{new}$ and the k-connected path between s
and t. Let $q_1, \ldots, q_m$ be the k-connected path in the tree structure $T_Q$ over the k-superedges $Q_p$ of the
$(k-1)$-superedge p containing s and t.

Intuitively, $h_{new}, q_1, \ldots, q_m$ "forms a hypercycle," but $h_{new}$ may collapse any of the $q_i$'s into one
k-superedge. So we extract a subpath of $q_1, \ldots, q_m$ such that $h_{new}$ collapses at most the terminal superedges of the subpath.

For each i, let $u_i$ be a maximum overlap between $h_{new}$ and a hyperedge in the k-superedge $q_i$ along
the path. We also require $|u_i| > k$; otherwise, we say that $u_i$ does not exist. We choose $u_1$ and $u_m$ so that
$u_1 \supseteq s$ and $u_m \supseteq t$. See Figure 5 for an illustration.

Extracting the subpath proceeds in two steps: 


1. Choose a subpath for which all overlaps between adjacent k-superedges on this subpath are not
contained in $h_{new}$. This is possible because s and t are not k-connected in $\bar{H}_{h_{new}}$.

2. Choose a subpath $q_i, \ldots, q_j$ from the resulting subpath of the first step such that the $u_r$'s do not exist
for $i < r < j$, but $u_i$ and $u_j$ do exist.

⁴Due to space limitations we provide only proof outlines; see [LS03] for full proofs.


⁵For notational convenience, we denote the overlap of $h_{new}$ and a hyperedge by the hyperedge with an added prime. For instance,
$h'_a = h_a \cap h_{new}$.


Figure 5: Construction of a hypercycle in Theorem 15. 


Now, consider the k-superedges $c' = q'_i, \ldots, q'_j$ in H' after $h_{new}$ optionally collapses the terminal
superedges. All overlaps in this sequence are block-distinct and of size k. Let l' be the number of superedges in c'. If $l' \geq 3$, c' is a regular hypercycle.
If $l' = 2$, c' is a hyperdoublet. If $l' = 1$, we have to delve deeper and look at the k'-superedges of $q'_i$ for the
smallest $k' > k$. There are at least two k'-superedges that result. If there are two, we have a hyperdoublet.
Otherwise, we have a hyperloop around the overlap of size k.

Therefore, H' is not a hyperforest. □


4.4 Time complexity 


In QUERY, we compute $\bar{H}_{h_{new}}$ by iterating through all subsets of $h_{new}$ and selecting the ones that are in $\bar{H}$.
We compute $Z_k$ by iterating through all $h' \in \bar{H}_{h_{new}}$ and unifying all pairs of k-subsets of $h'$. There are fewer
than $4^{|h_{new}|}$ pairs of k-subsets in $h_{new}$, and so computing $Z_k$ requires no more than $O(K \cdot 4^K)$ time.
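The bound on the number of pairs can be checked numerically. Summing over all sizes k, the number of ordered pairs of equal-size subsets of an n-element hyperedge is $\sum_k \binom{n}{k}^2 = \binom{2n}{n}$ (Vandermonde's identity), which is below $4^n$, so each individual k is below $4^n$ as well. A quick Python check (the function name is our own):

```python
from math import comb


def equal_size_subset_pairs(n):
    """Ordered pairs (s, t) with |s| = |t| = k, summed over k = 0..n."""
    return sum(comb(n, k) ** 2 for k in range(n + 1))


for n in range(1, 10):
    # Vandermonde's identity: sum_k C(n, k)^2 = C(2n, n), and C(2n, n) < 4^n.
    assert equal_size_subset_pairs(n) == comb(2 * n, n) < 4 ** n

print(equal_size_subset_pairs(3), 4 ** 3)  # 20 64
```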

Each $U_k$ can be implemented as an ordinary Union-Find structure. The number of supervertices stored
in these Union-Find structures is bounded by the maximal number of covered hyperedges in a hyperforest of
width K, which is less than $|V| 2^{K+1}$. The amortized time of each call to Find is then $O(\alpha(|V| 2^K))$, where
$\alpha$ is the inverse Ackermann function. Each QUERY operation calls Find once for each pair of k-subsets in
$h_{new}$, yielding a combined amortized run-time of $O(4^K (K + \alpha(|V| 2^K)))$ per QUERY operation.

Each INSERT operation calls Union a similar number of times, so its amortized run-time is also
$O(4^K (K + \alpha(|V| 2^K)))$.


5 Experiments with Greedy Hypertrees 


Given as input weights on candidate hyperedges, the weight of a hyperforest is equal to the sum of the
weights of all hyperedges it covers. In the K-maximum hypertree problem, we would like to find the
K-hyperforest of maximum weight. When K > 1 the problem is NP-hard [Sre00]. Figure 6 provides an
example where greedy approaches perform suboptimally.
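Under this convention, the objective can be evaluated directly from the covered hyperedges. A sketch with our own names, assuming weights are given as a dictionary keyed by candidate hyperedge:

```python
from itertools import combinations


def hyperforest_weight(H, weights):
    """Sum the weights of all candidate hyperedges covered by H."""
    covered = set()
    for h in H:
        for r in range(1, len(h) + 1):
            covered.update(frozenset(c) for c in combinations(h, r))
    return sum(w for h, w in weights.items() if frozenset(h) in covered)


weights = {"ab": 1.0, "bc": 2.0, "abc": 0.5, "cd": 4.0}
print(hyperforest_weight(["abc"], weights))  # 3.5: ab, bc and abc are covered, cd is not
```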


Figure 6: An example hypergraph on which a greedy algorithm would capture asymptotically none of the 
weight of the optimal hypertree. 


The common greedy heuristic for constructing a high-weight hyperforest is Prim-like: we start with the
highest-weight hyperedge, and at each iteration consider only candidate hyperedges of the form $s \cup \{v\}$,
where $s \subseteq h \in H$ is a subset of size exactly K of a hyperedge of the "current" hyperforest, and v is a new
vertex not yet in the hyperforest. The heaviest⁶ such hyperedge is added to the hyperforest, which remains
fully K-connected.


⁶If weights are specified also for non-maximal hyperedges, then when considering the weight of a hyperedge, the weights of all
its sub-hyperedges not already covered by H are added to it.
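The candidate set of the Prim-like heuristic is easy to enumerate. A sketch with our own names that ignores weights and only lists the candidates $s \cup \{v\}$:

```python
from itertools import combinations


def prim_candidates(H, vertices, K):
    """Candidates s | {v}: s is a K-subset of a current hyperedge and v is a
    vertex not yet in the hyperforest."""
    used = set().union(*H)
    fresh = set(vertices) - used
    return {frozenset(s) | {v} for h in H for s in combinations(h, K) for v in fresh}


cands = prim_candidates(["abc"], "abcde", K=2)
print(len(cands))  # 6: {ab, ac, bc} combined with d or e
```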

Using the data structure described in this paper, one can consider a more flexible Kruskal-like greedy 
procedure: at each iteration, all hyperedges which do not cause hypercycles are considered, and the heaviest 
one is added to the hyperforest. 
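A minimal sketch of this Kruskal-like procedure follows. To keep it self-contained, acyclicity is tested from scratch with the classical GYO reduction, swapped in for brevity where the dynamic data structure would be the efficient choice; weights are assumed to be given only on candidate hyperedges, and the function names are our own. Because acyclicity is not monotone, every remaining candidate is reconsidered at each iteration rather than discarded once rejected.

```python
from collections import Counter


def is_acyclic(hypergraph):
    """GYO reduction (stand-in for QUERY): acyclic iff it reduces to nothing."""
    edges = [set(h) for h in hypergraph]
    changed = True
    while changed:
        changed = False
        counts = Counter(v for h in edges for v in h)
        for h in edges:
            lonely = {v for v in h if counts[v] == 1}
            if lonely:
                h -= lonely  # vertices in a single hyperedge can be removed
                changed = True
        # Drop empty hyperedges and those contained in another (keep one duplicate).
        kept = [h for i, h in enumerate(edges)
                if h and not any(j != i and (h < g or (h == g and j < i))
                                 for j, g in enumerate(edges))]
        changed = changed or len(kept) != len(edges)
        edges = kept
    return not edges


def kruskal_like(weights, K):
    """Repeatedly add the heaviest candidate that keeps the hyperforest acyclic."""
    remaining = {frozenset(h): w for h, w in weights.items() if len(h) <= K + 1}
    forest = []
    while True:
        for h in sorted(remaining, key=remaining.get, reverse=True):
            if not any(h <= g for g in forest) and is_acyclic(forest + [h]):
                forest.append(h)
                del remaining[h]
                break
        else:
            return forest  # no remaining candidate can be added


tree = kruskal_like({"ab": 3, "bc": 2, "ac": 1, "cd": 0.5}, K=1)
print(["".join(sorted(h)) for h in tree])  # ['ab', 'bc', 'cd']
```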

To demonstrate the possible utility of a Kruskal-like greedy procedure, as compared to a Prim-like greedy procedure, we generated random weights on all candidate 2-hyperedges in a hypergraph with 100 vertices in the following way: we first constructed a random “planted” 2-hypertree by augmenting a hyperforest randomly. Hyperedges outside the planted hypertree were assigned weights uniformly distributed between 0 and 1. In one set of experiments, hyperedges inside the hypertree were assigned weights uniformly distributed between 0 and 10. In the second set, their weights were chosen uniformly between 0 and 1 with probability 1/2, and between 0 and 20 with probability 1/2. We generated 10 random weight-sets of each type and ran both greedy approaches on each. Table 1 summarizes the weights of the resulting hypertrees. The Kruskal-like approach performed significantly better on both sets of experiments, especially when the weight was less evenly distributed in the “planted” hypertree.
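The weight-generation protocol above can be sketched as follows. The text does not specify how the planted hypertree was randomly grown, so `planted_hypertree` assumes one simple choice: starting from a random hyperedge of K+1 vertices, each remaining vertex (in random order) is attached to a random K-subset of a randomly chosen existing hyperedge. The function names, the `experiment` flag, and the seeding are illustrative assumptions.

```python
import random
from itertools import combinations


def planted_hypertree(n, K, rng):
    """Random K-hypertree spanning vertices 0..n-1 (one assumed procedure)."""
    order = list(range(n))
    rng.shuffle(order)
    tree = [frozenset(order[:K + 1])]          # initial hyperedge
    for v in order[K + 1:]:
        h = rng.choice(tree)                   # random current hyperedge
        s = rng.sample(sorted(h), K)           # random K-subset of it
        tree.append(frozenset(s) | {v})        # attach the new vertex v
    return tree


def random_weights(n, K, experiment, rng):
    """Weights on all (K+1)-subsets of 0..n-1 for the two experiment types."""
    planted = set(planted_hypertree(n, K, rng))
    weights = {}
    for h in map(frozenset, combinations(range(n), K + 1)):
        if h not in planted:
            weights[h] = rng.uniform(0, 1)     # outside the planted hypertree
        elif experiment == 1:
            weights[h] = rng.uniform(0, 10)    # first set: uniform on [0, 10]
        elif rng.random() < 0.5:
            weights[h] = rng.uniform(0, 1)     # second set: [0, 1] w.p. 1/2
        else:
            weights[h] = rng.uniform(0, 20)    # ... or [0, 20] w.p. 1/2
    return planted, weights


planted, weights = random_weights(100, 2, experiment=1, rng=random.Random(0))
```

With n = 100 and K = 2 this assigns weights to all C(100, 3) = 161,700 candidate 2-hyperedges, of which 98 belong to the planted hypertree.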


                Experiment set 1               Experiment set 2
                (planted weights in [0, 10])   (planted weights in [0, 1] or [0, 20])
Planted         0.590 ± 0.0339                 0.609 ± 0.059
Prim-like       0.506 ± 0.0816                 0.323 ± 0.107
Kruskal-like    0.587 ± 0.0342                 0.619 ± 0.058


Table 1: Averages and standard deviations of the fraction of the weight captured by each hypertree: the planted hypertree, the hypertree recovered with a Prim-like greedy approach, and the one recovered with a Kruskal-like greedy approach.


6 Conclusion 


We have presented a dynamic data structure for keeping track of acyclicity in hypergraphs, which allows a hyperforest to be augmented while ensuring that new hyperedges do not break its acyclicity. Each operation takes time that is almost constant in the size of the hyperforest but, like most hyperforest algorithms, is exponential in the tree-width. Although an exponential dependence is probably unavoidable, it might well be possible to reduce the precise dependence.

The new dynamic data structure allows efficient implementation of Kruskal-like greedy heuristics for finding high-weight hyperforests, which have some advantages over Prim-like heuristics. However, the important problem of constructing efficient algorithms that approximate the maximum-weight hypertree well remains open.
References


[Bes74] Julian Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, pages 192-236, 1974.


[BFMY83] Catriel Beeri, Ronald Fagin, David Maier, and Mihalis Yannakakis. On the desirability of acyclic database schemes. Journal of the ACM, 30(3):479-513, 1983.




[BJ02] F. R. Bach and M. I. Jordan. Thin junction trees. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 569-576, Cambridge, MA, 2002. MIT Press.


[Bod96] Hans L. Bodlaender. A linear time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing, 25:1305-1317, 1996.


[BP01] Jozsef Bukszar and Andras Prekopa. Probability bounds with cherry trees. Mathematics of Operations Research, 26(1):174-192, 2001.


[CL68] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, IT-14(3):462-467, 1968.


[Cou90] B. Courcelle. The monadic second-order logic of graphs I: Recognizable sets of finite graphs. Information and Computation, 85:12-75, 1990.


[KS01] David Karger and Nathan Srebro. Learning Markov networks: Maximum bounded tree-width graphs. In Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms, 2001.


[LS03] Percy Liang and Nathan Srebro. A dynamic data structure for checking hyperacyclicity. Available on theory.lcs.mit/~naits/HyperTrees, 2003.


[Mal91] Francesco M. Malvestuto. Approximating discrete probability distributions with decomposable models. IEEE Transactions on Systems, Man and Cybernetics, 21(5):1287-1294, 1991.


[Mat99] Nicholas Matsakis. Recognition of handwritten mathematical expressions. Master’s thesis, Massachusetts Institute of Technology, 1999.


[SG97] Kirill Shoikhet and Dan Geiger. A practical algorithm for finding optimal triangulations. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 185-190, 1997.


[Sre00] Nathan Srebro. Maximum likelihood Markov networks: An algorithmic approach. Master’s thesis, Massachusetts Institute of Technology, 2000.


[Sre01] Nathan Srebro. Maximum likelihood bounded tree-width Markov networks. In The 17th Conference on Uncertainty in Artificial Intelligence, 2001.


[Tom86] Ioan Tomescu. Hypertrees and Bonferroni inequalities. J. Combin. Theory Ser. B, 41:209-217, 1986.


[Wor82] K. J. Worsley. An improved Bonferroni inequality and applications. Biometrika, 69:297-302, 1982.




